Explainable artificial intelligence

Explainable AI (XAI), often overlapping with interpretable AI or explainable machine learning (XML), refers either to an artificial intelligence (AI) system over which it is possible for humans to retain intellectual oversight, or to the methods used to achieve this.[1][2] The main focus is usually on the reasoning behind the decisions or predictions made by the AI,[3] which are made more understandable and transparent.[4] XAI counters the "black box" tendency of machine learning, in which even the AI's designers cannot explain why it arrived at a specific decision.[5][6]

XAI hopes to help users of AI-powered systems perform more effectively by improving their understanding of how those systems reason.[7] XAI may be an implementation of the social right to explanation.[8] Even if there is no such legal right or regulatory requirement, XAI can improve the user experience of a product or service by helping end users trust that the AI is making good decisions. XAI aims to explain what has been done, what is being done, and what will be done next, and to unveil which information these actions are based on.[9] This makes it possible to confirm existing knowledge, challenge existing knowledge, and generate new assumptions.[10]

Machine learning (ML) algorithms used in AI can be categorized as white-box or black-box.[11] White-box models provide results that are understandable to experts in the domain. Black-box models, on the other hand, are extremely hard to explain and may not be understood even by domain experts.[12] XAI algorithms follow the three principles of transparency, interpretability, and explainability. A model is transparent "if the processes that extract model parameters from training data and generate labels from testing data can be described and motivated by the approach designer."[13] Interpretability describes the possibility of comprehending the ML model and presenting the underlying basis for decision-making in a way that is understandable to humans.[14][15][16] Explainability is a concept that is recognized as important, but a consensus definition is not yet available;[13] one possibility is "the collection of features of the interpretable domain that have contributed, for a given example, to producing a decision (e.g., classification or regression)".[17] If algorithms fulfill these principles, they provide a basis for justifying decisions, tracking them and thereby verifying them, improving the algorithms, and exploring new facts.[18]
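The transparency principle above can be illustrated with a minimal sketch of a white-box model: a one-level decision stump whose fitting procedure and learned rule are both fully inspectable. The toy dataset and the "risk score" framing are hypothetical, chosen only for illustration.

```python
# A minimal sketch of a "white-box" model: a one-level decision stump
# whose learned rule can be read directly. The toy data and the "risk"
# framing are illustrative assumptions, not from any cited study.

def fit_stump(xs, ys):
    """Exhaustively choose the threshold that best separates the labels."""
    best = None
    for t in sorted(set(xs)):
        # Predict 1 when x >= t; count errors on the training data.
        errors = sum(1 for x, y in zip(xs, ys) if (x >= t) != y)
        if best is None or errors < best[1]:
            best = (t, errors)
    return best[0]

# Toy example: classify "high risk" (1) vs "low risk" (0) from one score.
scores = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
threshold = fit_stump(scores, labels)

# The entire model is one human-readable rule, so both the training
# process and the resulting decision basis can be described in full.
print(f"predict high risk if score >= {threshold}")  # → score >= 6.0
```

Because both the search procedure and the final rule fit in a few lines, the model is transparent in the sense quoted above: the process that extracts the parameter from training data can be described and motivated in full.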

White-box ML algorithms can sometimes also achieve high-accuracy results. These algorithms have an interpretable structure that can be used to explain predictions.[19] Concept Bottleneck Models, which use concept-level abstractions to explain model reasoning, are examples of this and can be applied in both image[20] and text[21] prediction tasks. This is especially important in domains like medicine, defense, finance, and law, where it is crucial to understand decisions and build trust in the algorithms.[9] Many researchers argue that, at least for supervised machine learning, the way forward is symbolic regression, where the algorithm searches the space of mathematical expressions to find the model that best fits a given dataset.[22][23][24]
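The idea behind symbolic regression can be sketched in miniature. Real systems search a vast expression space, typically with genetic programming; the sketch below instead enumerates a tiny hand-picked set of candidate formulas and keeps the best fit. The candidate list and toy data are illustrative assumptions.

```python
# A minimal, hedged sketch of symbolic regression: enumerate a small
# set of candidate expressions and select the one that best fits the
# data. The candidates and toy data are assumptions for illustration.

candidates = [
    ("x", lambda x: x),
    ("x**2", lambda x: x ** 2),
    ("2*x + 1", lambda x: 2 * x + 1),
    ("3*x - 2", lambda x: 3 * x - 2),
]

def mse(f, xs, ys):
    """Mean squared error of formula f on the dataset."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]

# The winning model is itself a readable formula, not a black box.
name, f = min(candidates, key=lambda c: mse(c[1], xs, ys))
print("best-fitting expression:", name)  # → 2*x + 1
```

The output of the search is a mathematical expression a domain expert can read and check, which is what makes this family of methods attractive in high-stakes settings.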

AI systems optimize behavior to satisfy a mathematically specified goal system chosen by the system designers, such as the command "maximize the accuracy of assessing how positive film reviews are in the test dataset." The AI may learn useful general rules from the test set, such as "reviews containing the word 'horrible' are likely to be negative." However, it may also learn inappropriate rules, such as "reviews containing 'Daniel Day-Lewis' are usually positive"; such rules may be undesirable if they are likely to fail to generalize outside the training set, or if people consider the rule to be "cheating" or "unfair." A human can audit rules in an XAI to get an idea of how likely the system is to generalize to future real-world data outside the test set.[25]
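The kind of audit described above can be sketched for a bag-of-words sentiment model. In the hypothetical toy corpus below, one proper noun ("day-lewis") co-occurs only with positive labels, so it acquires a strong score even though it says nothing about sentiment; a human reviewing the per-word scores can flag it as a spurious rule.

```python
# A minimal sketch of auditing the word-level "rules" a bag-of-words
# sentiment model learns. The four-review corpus is a hypothetical
# illustration, not a real dataset.
import math
from collections import Counter

reviews = [
    ("a horrible mess of a film", 0),
    ("horrible acting and a dull plot", 0),
    ("day-lewis gives a brilliant performance", 1),
    ("a brilliant film with day-lewis at his best", 1),
]

pos, neg = Counter(), Counter()
for text, label in reviews:
    (pos if label == 1 else neg).update(text.split())

def log_odds(word):
    """Smoothed log-odds of a word appearing in positive vs negative reviews."""
    return math.log((pos[word] + 1) / (neg[word] + 1))

# Words with extreme scores are the model's implicit rules. "horrible"
# is a legitimate sentiment cue, but "day-lewis" scores just as highly
# while being a spurious correlation unlikely to generalize.
for word in ["horrible", "brilliant", "day-lewis"]:
    print(f"{word}: {log_odds(word):+.2f}")
```

The scores alone cannot distinguish the legitimate cue from the spurious one; that judgment is exactly what the human auditor supplies.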

  1. ^ Longo, Luca; et al. (2024). "Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions". Information Fusion. 106. doi:10.1016/j.inffus.2024.102301.
  2. ^ Mihály, Héder (2023). "Explainable AI: A Brief History of the Concept" (PDF). ERCIM News (134): 9–10.
  3. ^ Phillips, P. Jonathon; Hahn, Carina A.; Fontana, Peter C.; Yates, Amy N.; Greene, Kristen; Broniatowski, David A.; Przybocki, Mark A. (2021-09-29). "Four Principles of Explainable Artificial Intelligence". doi:10.6028/nist.ir.8312. {{cite journal}}: Cite journal requires |journal= (help)
  4. ^ Vilone, Giulia; Longo, Luca (2021). "Notions of explainability and evaluation approaches for explainable artificial intelligence". Information Fusion. 76: 89–106. doi:10.1016/j.inffus.2021.05.009.
  5. ^ Castelvecchi, Davide (2016-10-06). "Can we open the black box of AI?". Nature. 538 (7623): 20–23. Bibcode:2016Natur.538...20C. doi:10.1038/538020a. ISSN 0028-0836. PMID 27708329. S2CID 4465871.
  6. ^ Sample, Ian (5 November 2017). "Computer says no: why making AIs fair, accountable and transparent is crucial". The Guardian. Retrieved 30 January 2018.
  7. ^ Alizadeh, Fatemeh (2021). "I Don't Know, Is AI Also Used in Airbags?: An Empirical Study of Folk Concepts and People's Expectations of Current and Future Artificial Intelligence". Icom. 20 (1): 3–17. doi:10.1515/icom-2021-0009. S2CID 233328352.
  8. ^ Edwards, Lilian; Veale, Michael (2017). "Slave to the Algorithm? Why a 'Right to an Explanation' Is Probably Not the Remedy You Are Looking For". Duke Law and Technology Review. 16: 18. SSRN 2972855.
  9. ^ a b Gunning, D.; Stefik, M.; Choi, J.; Miller, T.; Stumpf, S.; Yang, G.-Z. (2019-12-18). "XAI-Explainable artificial intelligence". Science Robotics. 4 (37): eaay7120. doi:10.1126/scirobotics.aay7120. ISSN 2470-9476. PMID 33137719.
  10. ^ Rieg, Thilo; Frick, Janek; Baumgartl, Hermann; Buettner, Ricardo (2020-12-17). "Demonstration of the potential of white-box machine learning approaches to gain insights from cardiovascular disease electrocardiograms". PLOS ONE. 15 (12): e0243615. Bibcode:2020PLoSO..1543615R. doi:10.1371/journal.pone.0243615. ISSN 1932-6203. PMC 7746264. PMID 33332440.
  11. ^ Vilone, Giulia; Longo, Luca (2021). "Classification of Explainable Artificial Intelligence Methods through Their Output Formats". Machine Learning and Knowledge Extraction. 3 (3): 615–661. doi:10.3390/make3030032.
  12. ^ Loyola-González, O. (2019). "Black-Box vs. White-Box: Understanding Their Advantages and Weaknesses From a Practical Point of View". IEEE Access. 7: 154096–154113. Bibcode:2019IEEEA...7o4096L. doi:10.1109/ACCESS.2019.2949286. ISSN 2169-3536.
  13. ^ a b Roscher, R.; Bohn, B.; Duarte, M. F.; Garcke, J. (2020). "Explainable Machine Learning for Scientific Insights and Discoveries". IEEE Access. 8: 42200–42216. arXiv:1905.08883. Bibcode:2020IEEEA...842200R. doi:10.1109/ACCESS.2020.2976199. ISSN 2169-3536.
  14. ^ Murdoch, W. James; Singh, Chandan; Kumbier, Karl; Abbasi-Asl, Reza; Yu, Bin (2019-01-14). "Interpretable machine learning: definitions, methods, and applications". Proceedings of the National Academy of Sciences of the United States of America. 116 (44): 22071–22080. arXiv:1901.04592. doi:10.1073/pnas.1900654116. PMC 6825274. PMID 31619572.
  15. ^ Lipton, Zachary C. (June 2018). "The Mythos of Model Interpretability: In machine learning, the concept of interpretability is both important and slippery". Queue. 16 (3): 31–57. doi:10.1145/3236386.3241340. ISSN 1542-7730.
  16. ^ "Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI". DeepAI. 2019-10-22. Retrieved 2021-01-13.
  17. ^ Montavon, Grégoire; Samek, Wojciech; Müller, Klaus-Robert (2018-02-01). "Methods for interpreting and understanding deep neural networks". Digital Signal Processing. 73: 1–15. arXiv:1706.07979. Bibcode:2018DSP....73....1M. doi:10.1016/j.dsp.2017.10.011. ISSN 1051-2004.
  18. ^ Adadi, A.; Berrada, M. (2018). "Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)". IEEE Access. 6: 52138–52160. Bibcode:2018IEEEA...652138A. doi:10.1109/ACCESS.2018.2870052. ISSN 2169-3536.
  19. ^ Rudin, Cynthia (2019). "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead". Nature Machine Intelligence. 1 (5): 206–215. arXiv:1811.10154. doi:10.1038/s42256-019-0048-x. ISSN 2522-5839. PMC 9122117. PMID 35603010.
  20. ^ Koh, P. W.; Nguyen, T.; Tang, Y. S.; Mussmann, S.; Pierson, E.; Kim, B.; Liang, P. (November 2020). "Concept bottleneck models". International Conference on Machine Learning. PMLR. pp. 5338–5348.
  21. ^ Ludan, J. M.; Lyu, Q.; Yang, Y.; Dugan, L.; Yatskar, M.; Callison-Burch, C. (2023). "Interpretable-by-Design Text Classification with Iteratively Generated Concept Bottleneck". arXiv:2310.19660 [cs.CL].
  22. ^ Wenninger, Simon; Kaymakci, Can; Wiethe, Christian (2022). "Explainable long-term building energy consumption prediction using QLattice". Applied Energy. 308. Elsevier BV: 118300. Bibcode:2022ApEn..30818300W. doi:10.1016/j.apenergy.2021.118300. ISSN 0306-2619. S2CID 245428233.
  23. ^ Christiansen, Michael; Wilstrup, Casper; Hedley, Paula L. (2022). "Explainable "white-box" machine learning is the way forward in preeclampsia screening". American Journal of Obstetrics and Gynecology. 227 (5). Elsevier BV: 791. doi:10.1016/j.ajog.2022.06.057. ISSN 0002-9378. PMID 35779588. S2CID 250160871.
  24. ^ Wilstrup, Casper; Cave, Chris (2021-01-15), Combining symbolic regression with the Cox proportional hazards model improves prediction of heart failure deaths, Cold Spring Harbor Laboratory, doi:10.1101/2021.01.15.21249874, S2CID 231609904
  25. ^ "How AI detectives are cracking open the black box of deep learning". Science. 5 July 2017. Retrieved 30 January 2018.